Intro To Assembly

Compiling

So first off, what is assembly code? Assembly code is a human-readable form of the machine instructions that your computer's processor actually executes. For instance, take some C code:

```c
#include <stdio.h>

void main(void)
{
    puts("Hello World!");
}
```

That code isn't what runs. Instead, it is compiled into machine code, which, when disassembled, looks like this:

```
0000000000001135 <main>:
    1135:  55                      push   rbp
    1136:  48 89 e5                mov    rbp,rsp
    1139:  48 8d 3d c4 0e 00 00    lea    rdi,[rip+0xec4]        # 2004 <_IO_stdin_used+0x4>
    1140:  e8 eb fe ff ff          call   1030 <puts@plt>
    1145:  90                      nop
    1146:  5d                      pop    rbp
    1147:  c3                      ret
    1148:  0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
    114f:  00
```

The purpose of languages like C is that we can program without having to deal directly with assembly code. We hand our code to a compiler, and the compiler generates assembly code that does whatever the C code tells it to. That code is what actually runs on the processor, so it helps to understand it. Also, since most of the time we are handed compiled binaries, the assembly code is often all we have to work from. However, tools such as Ghidra can take a compiled binary and show a decompiled view of what they think the original C code looked like, so we don't need to read endless lines of assembly code.

There are also many different assembly architectures, and different types of processors run different instruction sets. The two we will deal with most here are 64-bit and 32-bit x86 binaries, stored in the ELF (Executable and Linkable Format) file format. I will often call these two x64 and x86.

Stacks

Now, one of the most common memory regions you will deal with is the stack. It is where a function's local variables are stored.

For instance, in this code the variable x is stored on the stack:

```c
#include <stdio.h>

void main(void)
{
    int x = 5;
    puts("hi");
}
```

Specifically, we can see that it is stored on the stack at rbp-0x4:

```
0000000000001135 <main>:
    1135:  55                      push   rbp
    1136:  48 89 e5                mov    rbp,rsp
    1139:  48 83 ec 10             sub    rsp,0x10
    113d:  c7 45 fc 05 00 00 00    mov    DWORD PTR [rbp-0x4],0x5
    1144:  48 8d 3d b9 0e 00 00    lea    rdi,[rip+0xeb9]        # 2004 <_IO_stdin_used+0x4>
    114b:  e8 e0 fe ff ff          call   1030 <puts@plt>
    1150:  90                      nop
    1151:  c9                      leave
    1152:  c3                      ret
    1153:  66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
    115a:  00 00 00
    115d:  0f 1f 00                nop    DWORD PTR [rax]
```

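Conceptually, the stack is a LIFO (last in, first out) data structure: values are added by pushing them onto the top and removed by popping them off the top. In practice, code also grows and shrinks the stack by adjusting the stack pointer directly (as with sub rsp,0x10 above), and it can reference any value on the stack by its offset, as with rbp-0x4.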

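The bounds of the current function's stack frame are tracked by two registers, rbp and rsp (ebp and esp on x86). The base pointer rbp points to the base (bottom) of the current frame, and the stack pointer rsp points to the top of the stack. On x86 and x64 the stack grows toward lower addresses, so the top (rsp) sits at a lower address than the base (rbp), which is why local variables like x show up at negative offsets such as rbp-0x4.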